Practical Computing Skills for Omics Data (PLNTPTH 5004)
MCIC Wooster, Ohio State University
2025-08-26
Background in animal evolutionary genomics & speciation
In my free time, I enjoy bird watching – locally & all across the world
TBA
Name
Lab and Department
Research interests and/or current research topics
Something about you that is not work-related, such as a hobby or fun fact
Learning skills that will enable you to:
Do your research more reproducibly and efficiently (e.g. by using code)
Work with large-scale “omics” datasets and do applied bioinformatics
To do so, this course will focus primarily on what we may call “foundational computational skills” rather than on specific applications. For example, you will learn to:
Two related ideas:
Our focus is on #2.
“The most basic principle for reproducible research is: Do everything via code.”
—Karl Broman
Also important for reproducibility are:
Another motivator: working reproducibly will benefit future you!
Using code enables you to work more efficiently and automatically, which is particularly useful when you have to:
Omics data is increasingly important in biology, and most notably includes the study of:
The next lecture will introduce omics data in more detail.
Examples in the course will involve the analysis of nucleotide sequencing-based data (genomics and transcriptomics), which in many cases can be divided into two consecutive stages:
What this course does and does not focus on
While we’ll be using some example omics datasets, this course will not comprehensively cover specific omics analyses — our focus is much more on foundational computational skills.
A highly recommended follow-up course to learn omics analysis specifics:
Genome Analytics (HCS 7004) by Jonathan Fresnedo-Ramirez
Also: computational biology
TBA
The Unix shell (or the “Terminal”) is a command-line interface to computers.
Being able to use the Unix shell is a fundamental skill when working with omics data, for example because much of the specialized analysis software must be run using the shell.
Bash (shell language)
VS Code
Good project organization & documentation is a necessary starting point for reproducible research.
You’ll learn best practices for project organization, file naming, etc.
You’ll learn how to manage your data and software
To document and report what you are doing, you’ll use Markdown files.
Markdown
Using version control, you can more effectively keep track of project progress, collaborate, share code, revisit earlier versions, and undo.
Git is the version control software we will use,
and GitHub is the website that hosts Git projects (repositories).
You’ll also use Git + GitHub to hand in your graded assignments.
Thanks to supercomputer resources, you can work with very large datasets at speed — running up to 100s of analyses in parallel, and using much larger amounts of memory and storage space than a personal computer has.
Omics data analyses typically consist of many consecutive steps.
Using a workflow written with a workflow manager, you can run and rerun an entire analysis pipeline with a single command (and much more).
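To make the idea concrete, here is a minimal sketch in Python of what a workflow manager does at its core: it runs steps in order and skips any step whose output already exists, so rerunning the whole pipeline is cheap. This is a toy illustration under stated assumptions, not an actual workflow manager; the step names, file names, and the `run_pipeline` and `make_step` functions are all hypothetical.

```python
from pathlib import Path
import tempfile


def run_pipeline(steps, workdir):
    """Run each (name, output_file, action) step in order, skipping finished ones."""
    ran = []
    for name, output_file, action in steps:
        out_path = Path(workdir) / output_file
        if out_path.exists():
            # The output is already there, so this step does not need to rerun.
            continue
        action(out_path)
        ran.append(name)
    return ran


def make_step(text):
    # Hypothetical stand-in for a real analysis tool: it just writes a small file.
    def action(out_path):
        out_path.write_text(text)
    return action


# Three made-up consecutive steps, each producing one output file.
steps = [
    ("trim_reads", "trimmed.txt", make_step("trimmed reads\n")),
    ("align_reads", "aligned.txt", make_step("alignments\n")),
    ("count_genes", "counts.txt", make_step("gene counts\n")),
]

with tempfile.TemporaryDirectory() as workdir:
    first_run = run_pipeline(steps, workdir)   # all three steps run
    second_run = run_pipeline(steps, workdir)  # all outputs exist, nothing reruns
    print("First run: ", first_run)
    print("Second run:", second_run)
```

Real workflow managers add much more on top of this, such as tracking dependencies between steps and submitting jobs to a supercomputer, but the single entry point that runs everything is the core convenience.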
While the Unix shell, and software run through it, are best suited to the initial (algorithmic) processing steps of omics data, R is probably the most prominent language for the more “downstream”, often statistical, analysis and visualization of omics data.
In this course, you will learn the basics of R, how to visualize data in R, and how you can use specialized packages for omics data analysis.
R vs. Python
Python is also commonly used, but I believe that altogether, R is a far better choice for this course. On the other hand, Python is a great follow-up language to learn for those seeking to specialize in bioinformatics.
Be muted by default, but feel free to unmute yourself to ask questions any time.
Questions can also be asked in the chat.
Having your camera turned on as much as possible is appreciated!
“Screen real estate” — large/multiple monitors or multiple devices best.
Be ready to share your screen.
TBA
TBA
Most weeks, additional readings are optional because you are always expected to reread and practice more with the material that we go through in class.
Additionally, anything on a given page that we do not get to in class automatically turns into required self-study material.
Several weeks will have 1 or 2 papers as required reading.
In most weeks, the optional readings are chapters from the following two books:
TBA
TBA
You can earn a total of 100 points across 6 assignments and 4 final project checkpoints.
The assignments are due on Mondays and are worth 10 points each:
The first one is submitted through CarmenCanvas, while all others are submitted via GitHub so you can get more practice with that.
Plan and implement a small computational project, with the following checkpoints:
I: Proposal (due week 13 – 5 points)
II: Draft (due week 15 – 5 points)
III: Oral presentations on Zoom (week 16 – 10 points)
IV: Final submission (due Dec 15 – 20 points)
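As a quick check of the point breakdown, the tally below adds up the 6 assignments (10 points each) and the 4 final project checkpoints listed above. The numbers come from this page; the variable names are only illustrative.

```python
# Graded assignments: 6 assignments worth 10 points each.
n_assignments = 6
points_per_assignment = 10

# Final project checkpoints and their point values.
checkpoints = {
    "I: Proposal": 5,
    "II: Draft": 5,
    "III: Oral presentation": 10,
    "IV: Final submission": 20,
}

assignment_points = n_assignments * points_per_assignment
project_points = sum(checkpoints.values())
total_points = assignment_points + project_points

print(f"Assignments:   {assignment_points}")
print(f"Final project: {project_points}")
print(f"Total:         {total_points}")
```

The assignments account for 60 points and the final project for 40, which together give the 100-point total.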
Data sets for the final project
It is ideal if you have/develop your own idea for a data set and analysis — for example, that way you may do something that’s directly useful for your own research.
If not, I can provide you with a data set.
More information about the final project will follow later in the course.
TBA
Weekly readings
Weekly exercises — I recommend doing these on Fridays after the week’s session.
Miscellaneous small assignments such as surveys and account setup.
Weekly materials & homework
I will try to add the materials for each week on the preceding Friday, or at least the week’s overview and readings.
None of this homework has to be handed in.
We will have an optional but highly recommended weekly recitation meeting on Mondays, during which we go over the exercises for the preceding week.
Practice is key!
This course is intended to be highly practical, and if you don’t practice the skills we will focus on by yourself, you will not get much out of it.
Please indicate your availability here: TBA